Remarks

The Present Invention and the Pending Claims

The present invention relates generally to the field of speech recognition. More
particularly, the invention discloses a technique for disambiguating speech input using 
one of voice mode interaction, visual mode interaction, or a combination of voice mode 
and visual mode interaction. 

Claims 1, 4-5, 7-8, and 11-14 are currently pending. Reconsideration and allowance of
the pending claims is respectfully requested.

Summary of the Office Action 

Claims 1, 4, 7-8, 11 and 14 are rejected under 35 U.S.C. 103(a) over Lai et al. (USPN
6,006,183) referred to as Lai hereinafter in view of Duan et al. (US Patent No.
6,223,150) referred to as Duan hereinafter.

Claims 5 and 12-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over Lai in
view of Haddock et al. (USPN 5,265,014) referred to as Haddock hereinafter.

Amendments To The Claims 

Claims 1 and 11 are currently amended. Support for the amendments in claims 1
and 11 is found at paragraphs [0017], [0021], [0022], [0024], [0025], [0027], and
[0028].

The office action states: "Claims 1, 4, 7-8, 11 and 14 rejected under 35 U.S.C.
103(a) by Lai et al. (USPN 6,006,183) referred to as Lai hereinafter in view of Duan et
al. (US Patent No. 6,223,150) referred to as Duan hereinafter."

First, Lai in view of Duan does not teach or suggest all the claim limitations. The
applicant's invention discloses an options and parameters 114 component for setting
parameters required for controlling the disambiguation mechanism by the user and the
application (see paragraphs [0017], [0021], [0022], [0024], [0025], and Fig. 1; paragraph
[0024] recites: "The end user 108 and the application 106 can both set parameters 114 to
control the sub-components of the MDKT"). The limitation: "receiving parameters from a
user and the application for controlling the speech disambiguation mechanism, wherein
both the user and the application can set the parameters to control said mechanism..." in
claims 1 and 11 is not found in Lai or Duan. In contrast to the setting of parameters by
both the user and the application to control the speech disambiguation mechanism in
applicant's invention, neither Lai nor Duan teaches setting of parameters by the application.
Lai teaches assigning a confidence level score by a confidence level scorer 200 of the
speech engine 160 (Lai, col. 3, lines 29-30); enabling a user of the system to select score
thresholds (Lai, col. 3, lines 37-42); and allowing the user application to accept
information from the user control (Lai, col. 4, lines 11-15).

Furthermore, the Office Action states: "Lai fails to specifically disclose a plurality 
of alternatives are presented to a user for selection. In an analogous art Duan discloses 
providing a user with at least two possible tokens or word disambiguation alternatives..."
In response, claim 1 has been amended to recite the limitation: 

"a selection component that identifies, according to a selection algorithm, which 
two or more tokens generated without translation of the language in which the speech, 
audio or combination of speech and audio input is received and presents said tokens as 
alternatives to the user..."

Duan does not teach this limitation. Accordingly, Duan cannot be combined with Lai to
arrive at the claimed invention.

Furthermore, applicant discloses presenting the alternatives to the user in one of 
voice mode, visual mode, or a combination of voice and visual mode, and receiving a 
selection of an alternative from the user in one of voice mode, visual mode, or a 
combination of voice mode and visual mode. In contrast, neither Lai nor Duan teaches
presenting alternatives to the user in "a combination of voice and visual mode", and
"receiving a selection of an alternative by the user from the plurality of alternatives
presented to the user in one of voice mode, visual mode, or a combination of voice mode
and visual mode".

In summary, Lai in view of Duan does not teach the following limitations in claims
1 and 11:

"an options and parameters component for receiving user parameters and
application parameters for controlling the speech disambiguation mechanism,
wherein both the user and the application can set the parameters to control said
mechanism..." in claim 1, and

"a selection component that identifies, according to a selection algorithm, which
two or more tokens generated without translation of the language in which the
speech, audio or combination of speech and audio input is received, and presents
said tokens as alternatives to the user" in claim 1, and

"one or more disambiguation components that present the alternatives to the user
in one of voice mode, visual mode, or a combination of voice mode and visual
mode, and receive an alternative selected by the user from the plurality of
alternatives presented to the user in one of voice mode, visual mode, or a
combination of voice mode and visual mode" in claim 1, and

"receiving user parameters and application parameters for controlling the speech
disambiguation mechanism, wherein both the user and the application can set the
parameters to control said mechanism, and wherein the parameters include
confidence thresholds governing unambiguous recognition and close matches" in
claim 11, and

"selecting two or more tokens generated without translation of the language in
which the speech, audio or combination of speech and audio input is received, and
presenting said tokens as alternatives to the user" in claim 11, and

"presenting the alternatives to the user in one of voice mode, visual mode, or a
combination of voice and visual mode, and receiving a selection of an alternative
by the user from the plurality of alternatives presented to the user in one of voice
mode, visual mode, or a combination of voice mode and visual mode" in claim
11.

Therefore, Lai in view of Duan does not teach or suggest the limitations in 
amended claims 1 and 11. 

Furthermore, even if Lai and Duan are combined as suggested in the office action,
the combination that results will not arrive at the claimed invention. For example, Lai in
view of Duan will be inoperable and unsuccessful for the purpose of arriving at the
system or corresponding method steps in claims 1 and 11 respectively, because Lai in
view of Duan does not teach:

- "an application parameter for controlling the speech disambiguation
mechanism",

- "a speech recognition component that generates one or more tokens
corresponding to the speech input without translation of the language in
which the audio, speech, or a combination of speech and audio input is
received and presents said tokens as alternatives to the user",

- "one or more disambiguation components that present the alternatives to the
user in one of voice mode, visual mode, or a combination of voice mode and
visual mode", and

- "receives an alternative selected by the user from the plurality of alternatives
presented to the user in one of voice mode, visual mode, or a combination of
voice mode and visual mode".

Furthermore, applicant respectfully submits that the Lai and Duan references that are 
sought to be combined are in non-analogous arts. Applicant's invention is a speech
recognition system where tokens are generated based on recognition of the speech uttered 
by the user in a single language by a speech recognizer (see paragraph [0011]). In
contrast, Duan is a language translation system where tokens are generated based on
translation of a user's utterance from a source language to a target language (see Duan,
col. 21, lines 9-13; and col. 9, lines 55-65). The method by which tokens are generated by
a speech recognition system for speech uttered in a single language is vastly different and
distinguishable from the method in Duan where tokens are generated by a language
translation system that translates a user's utterance from a source language to a target
language. A person of ordinary skill in the art would not likely look at a language translation
system to find how a plurality of alternative words can be generated and presented to
a user, for use in a speech recognition system, for the reason stated above. Therefore, applicant
respectfully submits that the teachings of Lai and Duan may not be combined.

Claims 4 and 7-8 are dependent on claim 1, and claim 14 is dependent on
claim 11. Since Lai and Duan do not teach, suggest, or motivate the limitations of claims
1 and 11, applicant respectfully submits claims 4, 7-8, and 14 also to be novel over Lai
and Duan. The applicant solicits reconsideration and allowance of claims 1, 4, 7-8, 11,
and 14.

The office action states: "Claims 5 and 12-13 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Lai in view of Haddock et al. (USPN 5,265,014) referred to as
Haddock hereinafter". 

Even if Lai and Haddock are combined as suggested by the Examiner, the 
combination that results will be inoperable for the purpose intended by claim 1 in 
applicant's invention, i.e., "an options and parameters component for receiving user 
parameters and application parameters for controlling the speech disambiguation 
mechanism, wherein both the user and the application can set the parameters to control 
said mechanism, and wherein the parameters include confidence thresholds governing 
unambiguous recognition and close matches". In applicant's invention, the parameters are 
set by both the user and the application. In contrast, neither Lai nor Haddock teaches or
suggests setting of the parameters by both the user and the application.

The applicant's system for disambiguating speech input comprises, in part, a
disambiguation component that presents two or more alternatives to a user in voice mode,
visual mode, or a combination of voice mode and visual mode, and receives an
alternative selected by the user in voice mode, visual mode, or a combination of voice
mode and visual mode (see paragraph [0028]). However, neither Lai nor Haddock teaches
selection of the alternatives by the user using a combination of voice mode and visual
mode. As stated on pages 2 and 3, items i. to v. of the office action and Figure 2, Lai
suggests that an acoustic signal may be inputted to the speech recognizer 190 and the 
system may display words with confidence level indicated. However, Lai does not 
expressly teach that the preferences are provided to the user to select in voice mode or 
visual mode or a combination of voice mode and visual mode, but instead suggests that 
the input to the speech recognizer 190 is in voice mode and preferences are provided to 
the user in visual mode. Hence, there is no teaching, suggestion or motivation in Lai and 
Haddock of the following limitations recited in applicant's claims 5, 12 and 13: 

"wherein the disambiguation components present the alternatives to the user in a 
visual form and allow the user to select from among the alternatives using a voice 
input" of claim 5,

"where the interaction comprises the concurrent use of said visual mode and said
voice mode" of claim 12, and

"wherein the interaction comprises the user selecting from among the plural
alternatives using a combination of speech and visual-based input" of claim 13.

The disambiguation components such as the output generator and the input 
handler in the present invention (see paragraphs [0027] and [0028]) allow multimodal 
interaction including voice mode interaction, visual mode interaction, or a combination of 
voice mode and visual mode for a user with the disambiguation mechanism for the 
purposes of disambiguating speech input in case of an ambiguous speech input 
recognition. Claims 5, 12, and 13 (by virtue of their dependence on claims 1 and 11)
recite different instances of multimodal interaction of the user with the system for
disambiguating speech input using the disambiguation components.

Common sense dictates that a person of ordinary skill in the art, at the time the
invention was made, would not combine the method of indicating the level of confidence 
the system has in its speech recognition as described in Lai and the method for 
disambiguating natural language queries using referential input by a user as described in 
Haddock, to arrive at the claimed invention because Lai and/or Haddock show no 
recognition or appreciation of the following limitations recited in claims 5, 12, and 13: 

"wherein the disambiguation components present the alternatives to the user in a 
visual form and allow the user to select from among the alternatives using a voice 
input" of claim 5,

"where the interaction comprises the concurrent use of said visual mode and said 
voice mode" of claim 12, and 

"wherein the interaction comprises the user selecting from among the plural 
alternatives using a combination of speech and visual-based input" of claim 13. 

Furthermore, a secondary consideration of non-obviousness of applicant's 
invention is the option provided to the user for selecting the correct uttered word from a 
plurality of alternate words, if the speech disambiguation system fails to recognize the 
correct uttered word. Moreover, the speech disambiguation system enables the user to 
select the correct word in visual mode, voice mode or a combination of voice and visual
mode. Since the applicant's invention offers a wider choice and flexibility to the user for
selecting the correct uttered word while using a speech disambiguation system, it is more 
likely to be commercially successful. 

Another secondary consideration of non-obviousness of applicant's invention is 
that the existing art enables a user to set initial parameters for automatic speech
recognition based on user preferences. However, the existing art fails to provide an
option to set parameters based on the end application and requires resetting of the
parameters each time based on the application. Hence, there is a long-felt but unsolved
need for an automatic speech recognition system that is initialized based on the user 
preferences as well as the end application. 

In contrast to the above indicia of non-obviousness, Duan, Lai, and Haddock fail 
to suggest or implement a method of automatic speech recognition with parameter setting 
based on user preferences and the end application. 

For the reasons stated above, applicant respectfully submits that claims 5, 12, and 
13 are not obvious over the cited references, and applicant solicits reconsideration of the
rejection and allowance of claims 5, 12, and 13. 

Conclusion 

Applicant respectfully requests that a timely Notice of Allowance be issued in this 
case. If, in the opinion of Examiner Rider, a telephone conference would expedite the
prosecution of this application, Examiner Rider is requested to call the undersigned. 

Respectfully submitted, 



Date: July 21, 2008 Ashok Tankha, Esq.

Attorney For Applicant 
Reg. No. 33,802 
Phone: 856-266-5145 

Correspondence Address 
36 Greenleigh Drive 
Sewell, NJ 08080 
Fax: 856-374-0246